Estimating the Expected Error of Empirical Minimizers for Model Selection
Abstract
Model selection [e.g., 1] is the problem of choosing a hypothesis language that provides an optimal trade-off between low empirical error and structural complexity. In this abstract, we discuss the intuition behind a new, very efficient approach to model selection. Our approach is inherently Bayesian [e.g., 2], but instead of placing priors on target functions or hypotheses, we place priors on error values, which leads us to a new mathematical characterization of the expected true error.

In the setting of classification learning, a learner is given a sample, drawn according to an unknown distribution of labeled instances, and returns the empirical minimizer (the hypothesis with the least empirical error), which has a certain (unknown) true error. If this process is carried out repeatedly, the true error of the empirical minimizer varies from run to run, because the empirical minimizer depends on the (randomly drawn) sample. This induces a distribution of true errors of empirical minimizers over the possible samples drawn according to the unknown distribution. If this distribution were known, one could easily derive the expected true error of the empirical minimizer of a model by integrating over it. This would immediately lead to an optimal model selection algorithm: enumerate the models, calculate the expected error of each model by integrating over its error distribution, and select the model with the least expected error.

PAC theory [3] and the VC framework provide worst-case bounds on the chance of drawing a sample such that the true error of the minimizer exceeds some ε, "worst-case" meaning that they hold for any distribution of instances and any concept in a given class. By contrast, we focus on how to determine this distribution for a fixed, given learning problem (under some specified assumptions).
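The idealized selection rule above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's estimator: it assumes the distribution of true errors of each model's empirical minimizer is already known, represented here as a density on an error grid; all names and the toy Beta-shaped densities are made up for the example.

```python
import numpy as np

def expected_true_error(grid, density):
    """Expected true error: integrate err * p(err) over the grid (Riemann sum)."""
    dx = grid[1] - grid[0]
    return float(np.sum(grid * density) * dx)

def select_model(models):
    """models: name -> (grid, density). Pick the model with least expected error."""
    return min(models, key=lambda name: expected_true_error(*models[name]))

# Toy example: two made-up Beta-shaped error densities, normalized numerically.
grid = np.linspace(0.0, 1.0, 2001)

def beta_density(a, b):
    d = grid ** (a - 1) * (1 - grid) ** (b - 1)
    return d / (np.sum(d) * (grid[1] - grid[0]))

models = {
    "small_model": (grid, beta_density(2, 8)),   # mean error ~ 0.2
    "large_model": (grid, beta_density(2, 18)),  # mean error ~ 0.1
}
print(select_model(models))  # -> large_model
```

The point of the sketch is only the shape of the algorithm: once the error distribution per model is in hand, selection reduces to a one-dimensional integral per model followed by an argmin.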
Copyright ©1998, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Unlike the worst-case bound (which depends only on the size, or VC-dimension, of the hypothesis space), the actual error distribution depends on the hypothesis space and on the unknown distribution of labeled instances itself. However, we can prove that, under a certain assumption of independence of hypotheses, the distribution of true errors, and hence the expected true error, can be expressed as a function of the distribution of empirical errors of uniformly drawn hypotheses (which can be thought of as a prior on error values). The latter distribution (which is always one-dimensional) can be estimated from a fixed-size initial portion of the training data and a fixed-size set of randomly drawn hypotheses. This estimate of the distribution leads us to an estimate of the expected true error of the empirical minimizer of the model, which, in turn, leads to a highly efficient model selection algorithm.

We study the behavior of this approach in several controlled experiments. Our results show that the accuracy of the error estimate is at least comparable to the accuracy of the estimate obtained by 10-fold cross-validation, provided the prior on error values can be estimated using at least 50 examples. But while 10-fold cross-validation requires ten invocations of the learner per model, the time our algorithm requires to assess each model is constant in the size of the model. We also study the robustness of our algorithm against violations of our independence assumptions. We observe a bias in our predictions when the hypothesis space is of size four or less. When the hypothesis space is of size 40 or more, the dependencies are so diluted that violations of our assumptions are negligible and do not incur a significant error.

The full paper is available at http://ki.cs.tu-berlin.de/scheffer/papers/eed-report.ps.
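The two-step idea above can be sketched as follows. This is a hedged sketch, not the paper's exact estimator: step (1) estimates the prior on error values from the empirical errors of randomly drawn hypotheses on a small initial sample, and step (2) invokes the independence assumption to treat the empirical minimizer over m hypotheses as the minimum of m i.i.d. draws from that prior. The function names and the toy threshold-classifier hypothesis space are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_error_prior(hypotheses, X, y, k):
    """Empirical errors of k uniformly drawn hypotheses on the initial sample."""
    idx = rng.integers(0, len(hypotheses), size=k)
    return np.array([np.mean(hypotheses[i](X) != y) for i in idx])

def expected_minimizer_error(prior_errors, m, n_sim=10_000):
    """Monte Carlo estimate of E[min of m i.i.d. draws from the error prior]."""
    draws = rng.choice(prior_errors, size=(n_sim, m), replace=True)
    return float(draws.min(axis=1).mean())

# Toy problem: 1-D threshold classifiers; a "model" is this space of 41 rules.
X = rng.uniform(0, 1, size=200)
y = (X > 0.5).astype(int)
hypotheses = [lambda x, t=t: (x > t).astype(int) for t in np.linspace(0, 1, 41)]

prior = estimate_error_prior(hypotheses, X[:50], y[:50], k=30)  # 50-example prior
print(expected_minimizer_error(prior, m=len(hypotheses)))
```

Note that assessing a model costs only a fixed number of hypothesis evaluations on the fixed initial sample plus a one-dimensional Monte Carlo integral, rather than repeated invocations of the learner as in 10-fold cross-validation.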